knitr::opts_chunk$set(echo = FALSE, message = FALSE)
library(Seurat)
library(ggplot2)
library(data.table)
library(MAST)
library(SingleR)
library(dplyr)
library(tidyr)
library(limma)
library(scRNAseq)## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scRNAseq_2.2.0 limma_3.44.3
## [3] tidyr_1.1.1 dplyr_1.0.2
## [5] SingleR_1.2.4 MAST_1.14.0
## [7] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.2
## [9] DelayedArray_0.14.1 matrixStats_0.56.0
## [11] Biobase_2.48.0 GenomicRanges_1.40.0
## [13] GenomeInfoDb_1.24.2 IRanges_2.22.2
## [15] S4Vectors_0.26.1 BiocGenerics_0.34.0
## [17] data.table_1.13.0 ggplot2_3.3.2
## [19] Seurat_3.2.0
##
## loaded via a namespace (and not attached):
## [1] AnnotationHub_2.20.1 BiocFileCache_1.12.1
## [3] plyr_1.8.6 igraph_1.2.5
## [5] lazyeval_0.2.2 splines_4.0.2
## [7] BiocParallel_1.22.0 listenv_0.8.0
## [9] digest_0.6.25 htmltools_0.5.0
## [11] magrittr_1.5 memoise_1.1.0
## [13] tensor_1.5 cluster_2.1.0
## [15] ROCR_1.0-11 globals_0.12.5
## [17] colorspace_1.4-1 blob_1.2.1
## [19] rappdirs_0.3.1 ggrepel_0.8.2
## [21] xfun_0.16 crayon_1.3.4
## [23] RCurl_1.98-1.2 jsonlite_1.7.0
## [25] spatstat_1.64-1 spatstat.data_1.4-3
## [27] survival_3.2-3 zoo_1.8-8
## [29] ape_5.4-1 glue_1.4.1
## [31] polyclip_1.10-0 gtable_0.3.0
## [33] zlibbioc_1.34.0 XVector_0.28.0
## [35] leiden_0.3.3 BiocSingular_1.4.0
## [37] future.apply_1.6.0 abind_1.4-5
## [39] scales_1.1.1 DBI_1.1.0
## [41] miniUI_0.1.1.1 Rcpp_1.0.5
## [43] viridisLite_0.3.0 xtable_1.8-4
## [45] reticulate_1.16 bit_4.0.4
## [47] rsvd_1.0.3 htmlwidgets_1.5.1
## [49] httr_1.4.2 RColorBrewer_1.1-2
## [51] ellipsis_0.3.1 ica_1.0-2
## [53] pkgconfig_2.0.3 uwot_0.1.8
## [55] dbplyr_1.4.4 deldir_0.1-28
## [57] tidyselect_1.1.0 rlang_0.4.7
## [59] reshape2_1.4.4 later_1.1.0.1
## [61] AnnotationDbi_1.50.3 munsell_0.5.0
## [63] BiocVersion_3.11.1 tools_4.0.2
## [65] generics_0.0.2 RSQLite_2.2.0
## [67] ExperimentHub_1.14.1 ggridges_0.5.2
## [69] evaluate_0.14 stringr_1.4.0
## [71] fastmap_1.0.1 yaml_2.2.1
## [73] goftest_1.2-2 knitr_1.29
## [75] bit64_4.0.2 fitdistrplus_1.1-1
## [77] purrr_0.3.4 RANN_2.6.1
## [79] pbapply_1.4-3 future_1.18.0
## [81] nlme_3.1-148 mime_0.9
## [83] compiler_4.0.2 plotly_4.9.2.1
## [85] curl_4.3 png_0.1-7
## [87] interactiveDisplayBase_1.26.3 spatstat.utils_1.17-0
## [89] tibble_3.0.3 stringi_1.4.6
## [91] lattice_0.20-41 Matrix_1.2-18
## [93] vctrs_0.3.2 pillar_1.4.6
## [95] lifecycle_0.2.0 BiocManager_1.30.10
## [97] lmtest_0.9-37 RcppAnnoy_0.0.16
## [99] BiocNeighbors_1.6.0 cowplot_1.0.0
## [101] bitops_1.0-6 irlba_2.3.3
## [103] httpuv_1.5.4 patchwork_1.0.1
## [105] R6_2.4.1 promises_1.1.1
## [107] KernSmooth_2.23-17 gridExtra_2.3
## [109] codetools_0.2-16 MASS_7.3-52
## [111] assertthat_0.2.1 withr_2.2.0
## [113] sctransform_0.2.1 GenomeInfoDbData_1.2.3
## [115] mgcv_1.8-31 grid_4.0.2
## [117] rpart_4.1-15 rmarkdown_2.3
## [119] DelayedMatrixStats_1.10.1 Rtsne_0.15
## [121] shiny_1.5.0
In v2 of the analysis we decided to include the control mice from the Nbeal experiment with the Migr1 and Mpl mice. The thought is that it may be good to have another control, since the Migr1 control has irradiated and had a bone marrow transplantation. I’m going to split the Rmarkdown files into separate part, to better organize my analysis.
Here I will in include my efforts in to labeling the cell clusters in the integrated dataset. Planned analysis for labeling cell clusters:
Following along with previous analyses I’ve done and also the SingleR documentation.
## Cluster SingleR-comb SingleR-ref1 SingleR-ref2
## 1 0 Neutrophils Neutrophils Granulocytes
## 2 1 Granulocytes Neutrophils Granulocytes
## 3 2 Stem cells Stem Cells Granulocytes
## 4 3 B cells B cells B cells
## 5 4 Neutrophils Neutrophils Granulocytes
## 6 5 Monocytes Monocytes Monocytes
## 7 6 Basophils Basophils Granulocytes
## 8 7 Stem cells Stem Cells Monocytes
## 9 8 Macrophages Macrophages Macrophages
## 10 9 B cells B cells B cells
## 11 10 Erythrocytes B cells Erythrocytes
## 12 11 T cells T cells T cells
## 13 12 Stem cells Stem Cells Erythrocytes
## 14 13 B cells B cells B cells
## 15 14 <NA> B cells B cells
The above image shows how best to interpret the cell prediction. Clusters (columns) that have dark red rectangles, or have shaded rectangles featuring similar cell types (ie Neutrophils/Granulocytes) lend confidence to cluster labels. Clusters like 12 show many different labels, with NA cell types, shows a lower confidence in cluster labels.
## Cluster SingleR-comb SingleR-ref1 SingleR-ref2 SingleR_cell_comb
## 1 0 Neutrophils Neutrophils Granulocytes Neutrophil
## 2 1 Granulocytes Neutrophils Granulocytes Granulocytes
## 3 2 Stem cells Stem Cells Granulocytes Stem Cells
## 4 3 B cells B cells B cells B cell
## 5 4 Neutrophils Neutrophils Granulocytes Neutrophil
## 6 5 Monocytes Monocytes Monocytes Monocyte
## 7 6 Basophils Basophils Granulocytes Basophil
## 8 7 Stem cells Stem Cells Monocytes Stem cells
## 9 8 Macrophages Macrophages Macrophages Macrophages
## 10 9 B cells B cells B cells B cells
## 11 10 Erythrocytes B cells Erythrocytes Erythrocytes
## 12 11 T cells T cells T cells T cells
## 13 12 Stem cells Stem Cells Erythrocytes Stem Cells
## 14 13 B cells B cells B cells B cells
## 15 14 <NA> B cells B cells NA
Cell type specific marker gene expression. Genes were added to the list in two different ways: canonical markers that are well known in the field, and genes that distinguished clusters and were found to play a key role in specific cells.
Ighd: immunoglobulin heavy constant delta. Seems to clearly be expressed by B-cells, but still working on a good reference.
Gata2: From Krause paper: a transcription factor required for both lineages but bind in different combinations ref
Cd68: a human macrophage marker ref. A more general ref
Vcam1: found papers using Vcam1+ monocytes, but haven’t found a great reference.
Alas2: an erythroid-specfiic 5-aminolevulinate synthase gene ref
Gata3: plays a role in the regulation of T-cells ref
Vwf and Itga2b: I figure the reference would best be left to y’all.
Ly6g: from website it plays a role in monocyte, granulocyte, and neutrophil
Ngp: from uniprot “Expressed in myeloid bone marrow cells. Expressed in neutrophilic precursors (at protein level) (PubMed:8749713). Expressed in myeloid bone marrow cells (PubMed:21518852)”
Mmp8: neutrophil/lymphocyte collagenase link
## Warning: Could not find Cd38 in the default search locations, found in RNA assay
## instead
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: SCA-1
## Cluster SingleR-comb SingleR-ref1 SingleR-ref2 SingleR_cell_comb
## 1 0 Neutrophils Neutrophils Granulocytes Neutrophil
## 2 1 Granulocytes Neutrophils Granulocytes Granulocytes
## 3 2 Stem cells Stem Cells Granulocytes Stem Cells
## 4 3 B cells B cells B cells B cell
## 5 4 Neutrophils Neutrophils Granulocytes Neutrophil
## 6 5 Monocytes Monocytes Monocytes Monocyte
## 7 6 Basophils Basophils Granulocytes Basophil
## 8 7 Stem cells Stem Cells Monocytes Stem cells
## 9 8 Macrophages Macrophages Macrophages Macrophages
## 10 9 B cells B cells B cells B cells
## 11 10 Erythrocytes B cells Erythrocytes Erythrocytes
## 12 11 T cells T cells T cells T cells
## 13 12 Stem cells Stem Cells Erythrocytes Stem Cells
## 14 13 B cells B cells B cells B cells
## 15 14 <NA> B cells B cells NA
## markers
## 1 Granulocyte
## 2 NA
## 3 Granulocyte
## 4 B-cells
## 5 NA
## 6 Monocytes
## 7 Mast Cell/MEP
## 8 Granulocyte?
## 9 Monocyte/Macrophage
## 10 B-cells
## 11 Erythrocyte
## 12 T-cells
## 13 Megakaryocyte
## 14 Lymphocyte/Stromal Cell
## 15 Plasma Cell
Need documentation: see slack notes
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
Still isn’t completely clear. The thought is too limit the number of genes used to make this correlation. For these figures it was using the 2,000 most highly variable genes in the mouse dataset, and finding human orthologs in the the human bone marrow dataset. Some human genes had multiple mouse orthologs, and we took the first one in the list. This left us with 1,471 genes in the above comparison.
Of the 500 most variable in both mouse and human scRNA datasets (each starting with 1,471 genes), 181 overlapped between the two sets.
## Warning in FindVariableFeatures.Assay(object = assay.data, selection.method =
## selection.method, : selection.method set to 'vst' but count slot is empty; will
## use data slot instead
## Warning in eval(predvars, data, env): NaNs produced
## Warning in hvf.info$variance.expected[not.const] <- 10^fit$fitted: number of
## items to replace is not a multiple of replacement length
## Mode FALSE TRUE
## logical 273 227
Now just looking at the 500 most variable genes in the mouse data set.
## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.
Looking at genes that distinguish certain clusters from others, to help identify the cell type. This was partially done during the marker gene stage, but will be done more thoroughly in this section.
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Fcnb 0 1.7378054 0.953 0.324 0
## Chil3 0 1.6397866 0.995 0.619 0
## Hmgn2 0 1.4644989 0.997 0.796 0
## Hist1h2ap 0 1.2944567 0.874 0.355 0
## Ube2c 0 1.1577756 0.911 0.401 0
## Cebpe 0 1.1227218 0.997 0.498 0
## Mki67 0 1.0921853 0.991 0.394 0
## Top2a 0 1.0540932 0.950 0.350 0
## Birc5 0 0.9629361 0.963 0.375 0
## Tuba4a 0 0.9628674 0.992 0.642 0
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Ctsg 1.209808e-15 -0.7837163 0.393 0.322 2.419616e-12
## C1qc 3.637988e-15 -0.6959099 0.157 0.248 7.275977e-12
## C1qa 2.291724e-14 -0.9117860 0.186 0.273 4.583448e-11
## Pf4 4.540254e-14 -1.2180444 0.260 0.346 9.080509e-11
## Stfa2l1 1.651299e-13 -0.8231477 0.283 0.282 3.302597e-10
## Ccl5 9.203515e-12 -0.7810595 0.104 0.172 1.840703e-08
## Akr1c18 2.194479e-10 -0.7730957 0.245 0.240 4.388958e-07
## Jchain 3.281011e-09 -1.4884194 0.070 0.126 6.562022e-06
## Gm5483 1.644833e-08 -1.0157618 0.212 0.244 3.289667e-05
## Igha 1.116925e-05 -1.3644366 0.082 0.123 2.233850e-02
Top 10 * Fcnb: a gene expressed in granulocytes * Chil3: eosinophil chemotactic cytokine * Cebpe: may be essential for terminal differentiation and functional maturation of committed granulocyte progenitor cells * Mki67: marker of proliferation * Birc5: apoptosis inhibitor
Down 10 Many genes associated with lymphocytes (Igha, Jchain)
I think once again these aren’t truly stem cells but a granulocyte progenitor
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Elane 0 3.760702 0.980 0.352 0
## Prtn3 0 3.579206 1.000 0.447 0
## Mpo 0 3.448474 0.975 0.272 0
## Ctsg 0 3.076317 0.989 0.302 0
## Ms4a3 0 2.429491 0.980 0.358 0
## Rgcc 0 1.572454 0.961 0.413 0
## Gstm1 0 1.415298 0.975 0.370 0
## Prss57 0 1.300410 0.992 0.269 0
## Dmkn 0 1.234107 0.907 0.163 0
## Med21 0 1.156593 0.994 0.485 0
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Akr1c18 3.394719e-09 -0.7809194 0.130 0.245 6.789439e-06
## Prss34 5.547182e-09 -1.7616139 0.177 0.285 1.109436e-05
## Vpreb1 1.187639e-08 -0.8921897 0.051 0.139 2.375278e-05
## BC100530 1.306323e-08 -0.9247701 0.144 0.265 2.612647e-05
## H2-Ab1 3.483371e-08 -0.9871756 0.254 0.282 6.966742e-05
## Cpa3 1.508041e-07 -0.7044522 0.059 0.157 3.016081e-04
## Jchain 3.591692e-07 -1.4574552 0.042 0.123 7.183384e-04
## Igha 1.711388e-06 -1.3442218 0.045 0.121 3.422775e-03
## Ccl5 5.920506e-06 -0.7380559 0.085 0.167 1.184101e-02
## Ifitm1 1.034447e-05 -1.1345575 0.411 0.462 2.068895e-02
Top 10 Markers * Elane: creates a protein called neutrophil elastase * Prtn3: expressed in neutrophil granulocytes * Mpo: active during myeloid differentiation that consitutes the major component of neutrophil azurophilic granules. * Ctsg: found in azurophil granules of neutrophilic polymorphonuclear leukocytes.
A neutrophil cluster
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Gm15915 2.122751e-273 1.3437173 0.955 0.135 4.245502e-270
## Pf4 7.627217e-233 3.2489100 0.961 0.322 1.525443e-229
## Muc13 3.586159e-214 0.8234805 0.910 0.060 7.172318e-211
## Angpt1 5.369361e-200 0.7540279 0.933 0.067 1.073872e-196
## Meis1 8.788350e-196 0.7423735 0.899 0.076 1.757670e-192
## Mrvi1 1.523672e-184 0.8510940 0.927 0.107 3.047344e-181
## Ybx3 3.207742e-179 1.3989186 1.000 0.395 6.415484e-176
## Ncl 8.287229e-179 1.6625501 1.000 0.660 1.657446e-175
## Rab27b 7.655640e-166 0.7732915 0.826 0.042 1.531128e-162
## Srm 2.739026e-164 1.1343476 0.989 0.254 5.478051e-161
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Hbb-bt 2.793611e-14 -1.5069564 0.994 0.816 5.587222e-11
## Vpreb3 4.138232e-13 -2.1056835 0.096 0.261 8.276463e-10
## Gm5483 9.865494e-13 -1.2155142 0.416 0.237 1.973099e-09
## Pou2af1 9.986183e-13 -0.7748434 0.067 0.221 1.997237e-09
## Ly6d 3.939269e-09 -1.9298919 0.185 0.262 7.878538e-06
## H2-Aa 1.507854e-08 -0.8936143 0.343 0.213 3.015708e-05
## Iglc2 8.097041e-08 -1.8918088 0.146 0.138 1.619408e-04
## S100a4 2.239373e-07 -0.7541575 0.320 0.221 4.478746e-04
## Hbb-bs 2.460886e-07 -2.6505197 0.994 0.980 4.921772e-04
## Cd74 5.042619e-07 -2.0019411 0.270 0.325 1.008524e-03
Top 10 Markers * Pf4: Mks regulate the quiescence of HSCs through PF4 link * Mrv1: similar to Jaw1, a lymphoid-restricted protein whose expression is downregulated during myeloid differentiation, may be a myeloid leukemia tumor suppressor gene
This top 10 list isn’t very informative
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Mcpt8 0 4.979810 0.990 0.374 0
## Prss34 0 4.306974 0.958 0.231 0
## Ccl4 0 3.566765 0.892 0.271 0
## Ccl3 0 3.296363 0.973 0.284 0
## Ccl9 0 3.149800 0.980 0.289 0
## Ccl6 0 3.129451 0.970 0.654 0
## Cyp11a1 0 2.966787 0.997 0.213 0
## Ifitm1 0 2.915877 0.976 0.422 0
## Akr1c18 0 2.819709 0.809 0.199 0
## Cpa3 0 2.715900 0.770 0.107 0
## p_val avg_logFC pct.1 pct.2 p_val_adj
## Vpreb1 8.391170e-23 -0.9112661 0.024 0.144 1.678234e-19
## Dnajc7 1.476353e-20 -1.2045237 0.508 0.593 2.952707e-17
## Jchain 3.277613e-20 -1.4765529 0.019 0.127 6.555227e-17
## Igha 3.366733e-18 -1.3640588 0.022 0.125 6.733465e-15
## C1qb 2.133267e-17 -0.7737691 0.118 0.268 4.266535e-14
## Pafah1b3 2.070846e-15 -0.8540048 0.312 0.349 4.141693e-12
## Apoe 4.356531e-15 -1.1461806 0.204 0.362 8.713062e-12
## S100a4 6.416181e-15 -0.7703960 0.113 0.231 1.283236e-11
## H2-Ab1 6.769553e-09 -0.9540405 0.220 0.286 1.353911e-05
## Ccl5 2.984116e-07 -0.7035290 0.090 0.169 5.968232e-04
Top 10 Genes * Mcpt8, Prss34, Cpa3: mast cell genes * Ccl: many chemokine ligand genes
###Conclusions
Seems like these are mast cells
Looking at the markers I used in “MEP Cluster Introspection” to maybe clarify some clusters.
Looking for MEP markers. Used this paper I picked genes that were expressed in in all there of their MEPs (Kit, Myb, Tgfb1, Cd44). They do a lot looking at MEP subtypes, and once we finalize the MEP cluster/population this would be interesting to look into.
## Warning: Could not find Cd44 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Cnrip1 in the default search locations, found in RNA
## assay instead
## Warning: Could not find Cd9 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Lox in the default search locations, found in RNA assay
## instead
## Warning: Could not find Nfib in the default search locations, found in RNA assay
## instead
## Warning: Could not find Dhrs3 in the default search locations, found in RNA
## assay instead
Looking at the Amit paper from 2015, for transciprtion factors which define one population from another.
## Warning: Could not find Mbd2 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Tcf3 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Cited2 in the default search locations, found in RNA
## assay instead
## Warning: Could not find Fli1 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Lmo4 in the default search locations, found in RNA assay
## instead
## Warning: Could not find Gfi1 in the default search locations, found in RNA assay
## instead